---
id: task-34
title: Evidence matrix and validation reliability overhaul
status: Done
assignee: []
created_date: '2025-12-04'
updated_date: '2025-12-04'
labels:
- reliability
- ux
- data-quality
- high
dependencies: []
priority: high
---
## Description
<!-- SECTION:DESCRIPTION:BEGIN -->
Investigation revealed critical gaps between what the system displays and what it actually validates. ChatGPT is inventing data when asked to show matrices because the API doesn't provide complete, structured responses.
### Root Cause Analysis
#### Issue 1: Cross-Validation Runs Zero Checks
From `validation.json` for a "completed" session:
```json
{
  "total_validations": 0,
  "validations_passed": 0,
  "pass_rate": 0.0,
  "extraction_method": "llm"
}
```
The system says "all good" but ran **zero actual validations**. This happens because:
1. `cross_validate()` attempts LLM extraction for dates, tenure, project IDs
2. LLM extraction returns insufficient or empty data
3. Without extractable fields, no validation checks run
4. "pass_rate: 0.0" is interpreted as "no failures" rather than "nothing checked"
**Impact**: Users think their documents passed validation when nothing was actually checked.
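A minimal sketch of the failure mode, reconstructed from the behavior above (the `summarize` helper is hypothetical, not the actual code): a summary keyed only on failure count cannot distinguish "all passed" from "nothing checked".
```python
def summarize(validations_passed: int, total_validations: int) -> str:
    """Hypothetical reconstruction of the current summary logic."""
    failures = total_validations - validations_passed
    if failures == 0:
        # Bug: zero failures reads as success even when total_validations == 0,
        # i.e. when no check actually ran.
        return "Pass rate: 100% (no flags)"
    return f"Flags: {failures}/{total_validations}"

print(summarize(0, 0))  # -> "Pass rate: 100% (no flags)" despite zero checks
```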
#### Issue 2: ChatGPT Invents Matrix Data
When a user asked "Can you show me a matrix", ChatGPT fabricated:
- A table with 23 requirements categorized as "Auto-Validated" vs "Manual Review Needed"
- 5 "safeguard" requirements marked as needing manual review
- Citations that don't match the actual `evidence.json` data
**Root cause**: The API returns raw data without a structured matrix view, so ChatGPT fills the gap with hallucinated content based on its training data.
#### Issue 3: Evidence Data Exists But Isn't Displayed
The `evidence.json` DOES contain proper citations:
```json
{
  "text": "Credit Class Name: GHG Benefits...",
  "document_id": "DOC-f35bd7ab",
  "document_name": "4997Botany22 Public Project Plan.pdf",
  "page": 3,
  "section": "1.3. Credit Class and Methodology",
  "confidence": 0.95
}
```
But no endpoint returns this as a **standardized evidence matrix** with columns:
- Requirement ID
- Source Document
- Page Number
- Evidence Text
- Confidence Score
- Validation Status
#### Issue 4: No Requirement Classification
Requirements aren't classified by validation type:
- **Auto-validatable**: dates, project IDs, tenure (cross-document consistency checks)
- **Human-judgment**: safeguards, stakeholder consultation, risk assessment
Without this classification, the system can't report which requirements were machine-checked vs which need human review.
### Proposed Solutions
#### Solution 1: Create Evidence Matrix Endpoint
Add `/sessions/{session_id}/evidence-matrix` that returns:
```json
{
  "session_id": "session-xxx",
  "matrix": [
    {
      "requirement_id": "REQ-001",
      "category": "General",
      "description": "Latest methodology version applied",
      "evidence": [
        {
          "source_document": "4997Botany22 Public Project Plan.pdf",
          "page": 3,
          "text": "Methodology Version: 1.1",
          "confidence": 0.95
        }
      ],
      "validation_type": "auto",
      "validation_status": "passed",
      "human_review_required": false
    }
  ],
  "summary": {
    "total_requirements": 23,
    "auto_validated": 18,
    "pending_human_review": 5,
    "coverage": 1.0
  }
}
```
This gives ChatGPT structured data it can display directly without inventing content.
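A hedged sketch of how the endpoint could assemble that response from the citations already in `evidence.json`; the framework (FastAPI) and both loader helpers are assumptions, only the response shape comes from this task.
```python
from fastapi import FastAPI

app = FastAPI()

def load_checklist(session_id: str) -> list[dict]:
    # Hypothetical stand-in for the real checklist store.
    return [{"requirement_id": "REQ-001", "category": "General",
             "description": "Latest methodology version applied",
             "validation_type": "auto", "validation_status": "passed"}]

def load_evidence(session_id: str) -> dict[str, list[dict]]:
    # Hypothetical stand-in for reading the session's evidence.json.
    return {"REQ-001": [{"document_name": "4997Botany22 Public Project Plan.pdf",
                         "page": 3, "text": "Methodology Version: 1.1",
                         "confidence": 0.95}]}

@app.get("/sessions/{session_id}/evidence-matrix")
def evidence_matrix(session_id: str) -> dict:
    evidence = load_evidence(session_id)
    rows = []
    for req in load_checklist(session_id):
        citations = evidence.get(req["requirement_id"], [])
        rows.append({
            "requirement_id": req["requirement_id"],
            "category": req["category"],
            "description": req["description"],
            "evidence": [{"source_document": c["document_name"], "page": c["page"],
                          "text": c["text"], "confidence": c["confidence"]}
                         for c in citations],
            "validation_type": req["validation_type"],
            "validation_status": req["validation_status"],
            "human_review_required": req["validation_type"] == "human_judgment",
        })
    auto = sum(1 for r in rows if r["validation_type"] == "auto")
    return {
        "session_id": session_id,
        "matrix": rows,
        "summary": {
            "total_requirements": len(rows),
            "auto_validated": auto,
            "pending_human_review": len(rows) - auto,
            # Assumed meaning: fraction of requirements with at least one citation.
            "coverage": (sum(1 for r in rows if r["evidence"]) / len(rows)) if rows else 0.0,
        },
    }
```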
#### Solution 2: Add Requirement Classification to Checklist
Extend checklist schema:
```json
{
  "requirement_id": "REQ-019",
  "category": "Safeguards",
  "validation_type": "human_judgment",
  "auto_validatable": false,
  "cross_validate_fields": []
}
```
vs:
```json
{
  "requirement_id": "REQ-002",
  "category": "Land Tenure",
  "validation_type": "auto",
  "auto_validatable": true,
  "cross_validate_fields": ["owner_name", "area_hectares", "tenure_type"]
}
```
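An illustrative routing function over the extended schema (`partition_requirements` is a sketch, not existing code): auto-validatable requirements go to cross-validation, the rest to the human review queue.
```python
def partition_requirements(checklist: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split requirements into machine-checkable and human-judgment buckets."""
    auto = [r for r in checklist if r.get("auto_validatable")]
    human = [r for r in checklist if not r.get("auto_validatable")]
    return auto, human

checklist = [
    {"requirement_id": "REQ-002", "category": "Land Tenure",
     "validation_type": "auto", "auto_validatable": True,
     "cross_validate_fields": ["owner_name", "area_hectares", "tenure_type"]},
    {"requirement_id": "REQ-019", "category": "Safeguards",
     "validation_type": "human_judgment", "auto_validatable": False,
     "cross_validate_fields": []},
]
auto, human = partition_requirements(checklist)
assert [r["requirement_id"] for r in auto] == ["REQ-002"]
assert [r["requirement_id"] for r in human] == ["REQ-019"]
```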
#### Solution 3: Fix Cross-Validation Empty Results
When `total_validations: 0`, the system should:
1. Report which requirements COULD NOT be auto-validated
2. Explain WHY (missing structured data)
3. Route those to the human review queue
Change validation summary from:
```
"Pass rate: 100% (no flags)"
```
To:
```
"Checked: 5/23 requirements (auto-validatable)
Passed: 5/5 (100%)
Pending human review: 18/23 (qualitative requirements)"
```
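A sketch of the corrected summary logic, assuming checked/passed/total counts are available (the function is hypothetical; the wording mirrors the example above):
```python
def validation_summary(checked: int, passed: int, total: int) -> str:
    if checked == 0:
        # Zero-check runs must be flagged, never reported as a clean pass.
        return (f"Checked: 0/{total} requirements - no auto-validation ran; "
                f"all {total} routed to human review")
    return (f"Checked: {checked}/{total} requirements (auto-validatable)\n"
            f"Passed: {passed}/{checked} ({passed / checked:.0%})\n"
            f"Pending human review: {total - checked}/{total} "
            f"(qualitative requirements)")

print(validation_summary(5, 5, 23))
# Checked: 5/23 requirements (auto-validatable)
# Passed: 5/5 (100%)
# Pending human review: 18/23 (qualitative requirements)
```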
#### Solution 4: Standardize Data Matrix Responses
All matrix/table endpoints should include these columns:
1. **Identifier** (requirement_id, document_id)
2. **Description** (brief text)
3. **Source** (document name)
4. **Page** (page number)
5. **Status** (passed/failed/pending)
6. **Confidence** (0.0-1.0)
This prevents ChatGPT from inventing its own column structure.
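One way to pin that contract is a shared typed row model that every matrix endpoint serializes; this `TypedDict` is illustrative, not the project's actual model.
```python
from typing import Literal, TypedDict

class MatrixRow(TypedDict):
    """Standard columns for all matrix/table responses (sketch)."""
    identifier: str    # requirement_id or document_id
    description: str   # brief text
    source: str        # source document name
    page: int          # page number
    status: Literal["passed", "failed", "pending"]
    confidence: float  # 0.0-1.0
```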
### Workflow Improvements
**Current workflow has redundancy and gaps**:
1. Evidence Extraction → extracts text with page/document citations ✓
2. Cross-Validation → attempts to extract the SAME fields again, fails silently ✗
3. Report Generation → uses evidence.json data ✓
4. Human Review → has no structured view of what needs review ✗
**Proposed streamlined workflow**:
1. Evidence Extraction → extracts text AND structured fields (dates, IDs, tenure)
2. Cross-Validation → uses pre-extracted structured fields, reports what couldn't be checked
3. Evidence Matrix → shows all requirements with citations and validation status
4. Human Review → shows only requirements requiring judgment, with evidence summary
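A hypothetical sketch of step 2 in this workflow: cross-validation consumes structured fields captured during extraction instead of re-extracting them, and explicitly reports what it could not check.
```python
def cross_validate(requirements: list[dict], extracted: dict[str, dict]) -> dict:
    """Sketch: validate only requirements whose fields were pre-extracted."""
    checked, passed, unchecked = [], [], []
    for req in requirements:
        fields = req.get("cross_validate_fields", [])
        values = extracted.get(req["requirement_id"], {})
        if not fields or not all(f in values for f in fields):
            unchecked.append({"requirement_id": req["requirement_id"],
                              "reason": "missing structured data"})
            continue
        checked.append(req["requirement_id"])
        # A real consistency check across documents would go here; the
        # sketch records the requirement as passed for illustration.
        passed.append(req["requirement_id"])
    return {"checked": checked, "passed": passed, "unchecked": unchecked}
```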
### Acceptance Criteria
- [x] `/evidence-matrix` endpoint returns structured matrix with standard columns
- [x] Requirements checklist includes `validation_type` classification
- [x] Cross-validation reports what it DID check vs what it COULDN'T check
- [x] Zero-validation cases are clearly flagged (not silent "success")
- [x] All matrix views include: source document, page, description, status
- [x] ChatGPT displays actual API data, not invented content (GPT instructions updated)
### Implementation Notes (2025-12-04)
1. **API Enhancement**: Added `extracted_value` and `section` fields to `/evidence-matrix` endpoint
2. **GPT Instructions**: Created `docs/specs/2025-12-03-gpt4.md` with:
- Explicit endpoint routing: "evidence matrix" → `/evidence-matrix`
- 8-column matrix format specification
- "NEVER omit ANY columns" rule
- API field-to-column mapping
3. **Commit**: `77c2152` - Add extracted_value to evidence-matrix API and update GPT instructions
<!-- SECTION:DESCRIPTION:END -->